### Defects and Faults in Emerging Integrated Circuit Technologies

#### J. A. Abraham

#### The University of Texas at Austin

64<sup>th</sup> Meeting of the IFIP 10.4 Working Group on Dependable Computing and Fault Tolerance, Visegrád, Hungary, June 29, 2013

### The Integrated Circuit (IC)

- 1958: Jack Kilby, working at TI, dreams up the idea of a monolithic "integrated circuit" (IC)
- 1959: Robert Noyce, at Fairchild, independently develops the IC, solving many practical problems
- 2000: Kilby receives Nobel Prize in Physics (Noyce was no longer alive)

Diagram from Kilby patent application





### Moore's Original Graph



Moore's Law is somewhat self-fulfilling: process engineers and equipment manufacturers use it as a target

#### Leaders in industry see a path to 7nm technology

University of Texas at Austin

Defects and Faults in Emerging IC Technologies

### Moore's Law

Transistor counts have doubled every 26 months for the past three decades
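As a rough check of what this doubling rate implies, the arithmetic can be sketched as follows. The 26-month period and the 30-year span come from the statement above; the function name is only illustrative.

```python
# Growth implied by a fixed doubling period.
DOUBLING_PERIOD_MONTHS = 26   # figure quoted on the slide
SPAN_YEARS = 30               # "three decades"


def growth_factor(span_years: float, doubling_months: float) -> float:
    """Return the multiplicative growth over span_years."""
    doublings = span_years * 12 / doubling_months
    return 2.0 ** doublings


doublings = SPAN_YEARS * 12 / DOUBLING_PERIOD_MONTHS
factor = growth_factor(SPAN_YEARS, DOUBLING_PERIOD_MONTHS)
print(f"{doublings:.1f} doublings, growth factor about {factor:,.0f}x")
```

Under these numbers the transistor count grows by roughly four orders of magnitude over the three decades.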



### Exponential Rate of Emerging Technology (for 110 Years)




### So what are the challenges we are now facing?

#### Defects

- Hardware: manufacturing, wearout
- Design: bugs

#### Faults

- Hardware: process-related, environmental
- Software: bugs
- System: external attacks, intrusions

#### Dealing with these would require a combination of fault avoidance and fault tolerance


### Defects in Nanoscale Technologies

Variations in devices due to subwavelength lithography, random dopant fluctuations, etc.

Experiments on real chips

- Some tests for logic-level "stuck-at" faults do not detect defects unless they are applied at speed

Interconnect opens are resistive (not complete breaks)

- example: Cu interconnect with barrier materials
- effect: delay faults in digital circuits
- analog and RF behavior?



### Defect Tolerance in Memories

Spare rows/columns for tolerating defects are common in current memory structures

Example of application in FPGAs



#### Source: Altera
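The spare-row idea can be illustrated with a toy model. This is a minimal sketch, not how any particular memory or FPGA implements repair: the class name, the sizes, and the dictionary-based remap table are all assumptions made for illustration (real parts typically program the remapping with fuses or configuration bits).

```python
class RepairableMemory:
    """Toy memory array with spare rows (all names and sizes are illustrative)."""

    def __init__(self, rows: int, cols: int, spare_rows: int):
        self.rows = rows
        self.cols = cols
        # Physical storage: the regular rows followed by the spare rows.
        self.cells = [[0] * cols for _ in range(rows + spare_rows)]
        self.free_spares = list(range(rows, rows + spare_rows))
        self.remap = {}  # defective logical row -> spare physical row

    def repair(self, defective_row: int) -> bool:
        """Map a defective row onto a spare; return False if none remain."""
        if defective_row in self.remap:
            return True
        if not self.free_spares:
            return False
        self.remap[defective_row] = self.free_spares.pop(0)
        return True

    def _physical(self, row: int) -> int:
        if not 0 <= row < self.rows:
            raise IndexError(row)
        return self.remap.get(row, row)

    def write(self, row: int, col: int, value: int) -> None:
        self.cells[self._physical(row)][col] = value

    def read(self, row: int, col: int) -> int:
        return self.cells[self._physical(row)][col]


mem = RepairableMemory(rows=4, cols=8, spare_rows=1)
print(mem.repair(2))    # row 2 found defective at test time, spare assigned
mem.write(2, 3, 1)      # the access is transparently redirected to the spare
print(mem.read(2, 3))
print(mem.repair(0))    # a second defective row cannot be repaired here
```

The user of the memory sees the same logical addresses before and after repair; only the number of spares limits how many defective rows can be tolerated.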

### Defect Tolerance in Carbon Nanotube Circuits



#### Source: Mitra, 2005

### Variations in Nanoscale Technologies



### Features Smaller than Wavelengths

#### What is drawn is not what is printed on silicon



#### Source: Raul Camposano, Synopsys

### Optical Proximity Correction (OPC)

#### What you see is NOT what you get



### Imperfect Process Control

- Neighboring shapes interfere with the desired shape at some location: results in pattern sensitivity
- This is predominantly in the same plane
- There will be some interference from buried features for interconnect



#### Source: T. Brunner, ICP 2003

### Line Edge Roughness

- In the lithography process, dose of photons will fluctuate due to finite quanta
  - Shot noise
- There will be fluctuations in the photon absorption positions
  - Due to nanoscale impurities in the resist composition

- Poly lines subject to increasing line edge roughness (LER)
  - Impact: circuit delay and leakage power
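To give a feel for the shot-noise contribution, the sketch below assumes photon arrivals follow Poisson statistics (the usual model for shot noise), so the relative dose fluctuation is $1/\sqrt{N}$ for a mean of $N$ photons. The photon counts are hypothetical and chosen only to show the trend.

```python
import math


def relative_dose_sigma(mean_photons: float) -> float:
    """Relative standard deviation of a Poisson-distributed photon count."""
    return 1.0 / math.sqrt(mean_photons)


# Hypothetical mean photon counts absorbed in a small resist volume.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} photons: sigma/mean = {relative_dose_sigma(n):.1%}")
```

Fewer photons per feature means a larger relative fluctuation, which is why the effect becomes more visible as features shrink.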



### Random Dopant Fluctuations



### Fluctuation in Gate Oxide Thickness

- Gate oxide variations have an exponential effect on gate tunneling currents
- Impact on device threshold voltage, but significantly less $V_t$ variation than that caused by random dopant fluctuations
- Recent advances in high-k gate dielectrics (hafnium oxides) with metal gates have alleviated this problem



### Variability due to Back-End Processing



- Chemical/Mechanical Polishing (CMP)
  - Introduces large systematic intra-layer variations in interconnect thickness
  - Additional inter-layer interconnect thickness effects as well
- Topography variations result in focus variation for lines, leading to width variations




### Dynamic Temperature Variations

Thermal Map – 1.5 GHz Itanium Chip





### Dynamic Voltage and Power Variations

#### Voltage variations



Source: D. Hathaway, SLIP 2005





Source: Naffziger et al, JSSC 2006

### Effect of Variations on Circuit Performance

PSROs relative to reticle mean 05131SEA005.008



Source: Anne Gattiker, IBM

- Ring oscillators used for performance monitoring
- Variations range from 11% slower to 13% faster than the mean on the same die
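How a ring oscillator turns device speed into a measurable number can be sketched with the standard relation $f = 1/(2 N t_d)$ for $N$ inverting stages of delay $t_d$. The stage count and the stage delays below are hypothetical; they are not the IBM measurements.

```python
def ring_oscillator_freq_hz(stages: int, stage_delay_s: float) -> float:
    """Oscillation frequency of a ring of identical inverting stages."""
    if stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * stages * stage_delay_s)


def relative_to_mean(freqs):
    """Express each frequency as a fractional deviation from the mean."""
    mean = sum(freqs) / len(freqs)
    return [f / mean - 1.0 for f in freqs]


# Hypothetical stage delays (seconds) at four locations on one die.
stage_delays = [10e-12, 11e-12, 9e-12, 10e-12]
freqs = [ring_oscillator_freq_hz(101, d) for d in stage_delays]
for d, dev in zip(stage_delays, relative_to_mean(freqs)):
    print(f"stage delay {d * 1e12:.0f} ps -> {dev:+.1%} relative to mean")
```

Because the frequency is inversely proportional to stage delay, a slow region of the die shows up directly as a lower oscillator frequency.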

### Variation Effects in Real Chips



Source: Kevin Nowka, IBM

- Multicore chip from IBM
  - Core-0 was found to be $\approx 15\%$ slower than other parts
- Models predicted that all parts of the design are identical

### Variation in Other Circuit Elements

Normalized capacitance distribution on a single layer



Source: C. Visweswariah

- This enormous variation has a significant impact on analog/RF design
- Industry "sweet spots" for analog design are $0.25\mu$ to $0.18\mu$


### Statistical Static Timing Analysis (SSTA)

- Determine the circuit timing from the delays of components
- Path-based SSTA


- Select representative set of critical paths from normal (static) timing analysis
- Model the delay of each path as a function of random variables (the underlying sources of variation)
- Predict the parametric yield curve, as well as generate diagnostics
- Generate set of path delay tests for manufacturing screen
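The path-based flow above can be sketched with a small Monte Carlo model. This is only an illustration under stated assumptions: sampling is used here in place of an analytical SSTA formulation, each path delay is modeled as linear in two normalized sources of variation, and every path name, nominal delay, and sensitivity is hypothetical.

```python
import random

# Hypothetical critical paths: nominal delay (ps) plus linear sensitivities
# (ps per sigma) to shared, normalized sources of variation.
SOURCES = ("gate_length", "threshold_voltage")
PATHS = {
    "path_A": (900.0, {"gate_length": 30.0, "threshold_voltage": 20.0}),
    "path_B": (880.0, {"gate_length": 45.0, "threshold_voltage": 10.0}),
    "path_C": (860.0, {"gate_length": 15.0, "threshold_voltage": 50.0}),
}


def sample_chip(rng):
    """Draw one chip; return (name of slowest path, its delay in ps)."""
    x = {s: rng.gauss(0.0, 1.0) for s in SOURCES}  # shared by all paths
    delays = {
        name: nominal + sum(sens[s] * x[s] for s in SOURCES)
        for name, (nominal, sens) in PATHS.items()
    }
    worst = max(delays, key=delays.get)
    return worst, delays[worst]


def analyze(clock_period_ps, samples=20_000, seed=1):
    """Estimate parametric yield and how often each path is the slowest."""
    rng = random.Random(seed)
    passing = 0
    critical = {name: 0 for name in PATHS}
    for _ in range(samples):
        worst, delay = sample_chip(rng)
        critical[worst] += 1
        passing += delay <= clock_period_ps
    criticality = {name: n / samples for name, n in critical.items()}
    return passing / samples, criticality


for period in (900, 950, 1000, 1050):
    estimated_yield, _ = analyze(period)
    print(f"clock period {period} ps -> estimated yield {estimated_yield:.3f}")
print("criticality:", analyze(1000)[1])
```

Sweeping the clock period traces out the parametric yield curve, and the criticality fractions serve as a simple diagnostic: they suggest which paths deserve delay tests in the manufacturing screen.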



### Effects of Defects and Variations

#### Resistive Opens, Crosstalk, etc.

- Could affect delays of paths
- Delays are cumulative

#### Dealing with Small Delay Defects

- Need to detect distributed delays along a path
- Detecting small delays is a specification test, like analog/mixed-signal test
- Digital testing is moving towards analog: need to test parameters against specifications!
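A small numeric sketch of why cumulative small delays call for a specification-style test. The gate delays, the per-stage extra delay, the clock period, and the delay specification are all hypothetical values chosen for illustration.

```python
def path_delay_ps(gate_delays_ps, extra_delays_ps):
    """Total path delay: nominal gate delays plus distributed extra delay."""
    return sum(g + e for g, e in zip(gate_delays_ps, extra_delays_ps))


def fails_test(path_delay, limit_ps):
    """A delay test flags the path when its delay exceeds the limit."""
    return path_delay > limit_ps


gates = [100.0] * 8      # hypothetical 8-gate path, 800 ps nominal
extra = [15.0] * 8       # hypothetical 15 ps of extra delay at every stage
total = path_delay_ps(gates, extra)

CLOCK_PERIOD_PS = 1000.0   # pass/fail at the rated clock
DELAY_SPEC_PS = 900.0      # hypothetical specification limit for this path

print(f"total path delay: {total:.0f} ps")
print("caught at rated clock:", fails_test(total, CLOCK_PERIOD_PS))
print("caught against spec:  ", fails_test(total, DELAY_SPEC_PS))
```

No single stage looks defective, yet the accumulated delay violates the specification while still passing a simple at-speed check, which is the sense in which the test resembles a parametric (analog-style) measurement.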

### RAZOR – Aggressive Voltage Overscaling

- Error-tolerant dynamic voltage scaling (DVS) technique that eliminates the voltage margins normally required to guarantee "always correct" circuit operation
- A mismatch between the main flip-flop and the shadow latch indicates a timing error
- Pipeline state is recovered after timing-error detection
- Error detection is done at the circuit level
  - The design overhead is large if timing paths are well balanced in the design
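The detection mechanism can be modeled behaviorally. The sketch below is a simplified timing model, not the circuit itself: the main flip-flop samples at the clock edge, the shadow latch samples a fixed delay later, and all timing values and names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RazorResult:
    main_value: int       # value captured by the main flip-flop
    shadow_value: int     # value captured by the delayed shadow latch
    error: bool           # True when the two disagree
    recovered_value: int  # value used to restore pipeline state


def razor_capture(old_value, new_value, arrival_ps, clock_period_ps, shadow_delay_ps):
    """Model one capture of a signal that changes from old_value to new_value."""
    if arrival_ps > clock_period_ps + shadow_delay_ps:
        raise ValueError("data arrives after the shadow latch closes; not recoverable")
    # The main flip-flop sees the new value only if it arrived before the clock edge.
    main = new_value if arrival_ps <= clock_period_ps else old_value
    # The shadow latch samples later, so it always holds the correct value here.
    shadow = new_value
    return RazorResult(main, shadow, main != shadow, shadow)


# Nominal voltage: data arrives in time, no error is flagged.
print(razor_capture(0, 1, arrival_ps=950, clock_period_ps=1000, shadow_delay_ps=200))
# Overscaled voltage: the path slows down, the main flip-flop misses the data.
print(razor_capture(0, 1, arrival_ps=1100, clock_period_ps=1000, shadow_delay_ps=200))
```

In the second case the mismatch flags the timing error and the shadow value is what the recovery step would restore. When many paths have nearly the same delay, many flip-flops need this treatment, which is where the overhead noted above comes from.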



### Application-Level Checks for Aggressive Voltage Overscaling

- Discrete wavelet lifting transform in JPEG 2000
- Use of checksum technique
- Novel definition of "error": Signal-to-Noise Ratio (SNR)
- Strong correlation between checksum and SNR
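A checksum check on a lifting step can be sketched as follows. Two assumptions are made loudly here: a single-level Haar lifting step stands in for the JPEG 2000 lifting filters, and the conventional SNR formula is used rather than the specific definition from the talk. The data values are hypothetical.

```python
import math


def haar_lifting(x):
    """One level of the Haar wavelet via lifting: predict, then update."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]          # predict step
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update step
    return approx, detail


def predicted_checksums(x):
    """Output sums implied by linearity, computed from the input alone."""
    s_even, s_odd = sum(x[0::2]), sum(x[1::2])
    return (s_even + s_odd) / 2, s_odd - s_even  # (sum of approx, sum of detail)


def checksum_error(x, approx, detail):
    """Distance between the observed output sums and the predicted ones."""
    expected_approx, expected_detail = predicted_checksums(x)
    return abs(sum(approx) - expected_approx) + abs(sum(detail) - expected_detail)


def snr_db(reference, observed):
    """Conventional SNR in dB between a reference and an observed signal."""
    signal = sum(r * r for r in reference)
    noise = sum((r - o) ** 2 for r, o in zip(reference, observed))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


x = [10, 12, 9, 7, 4, 6, 8, 8]
approx, detail = haar_lifting(x)
print("fault-free checksum error:", checksum_error(x, approx, detail))

faulty = list(approx)
faulty[1] += 3  # hypothetical timing-induced error in one output
print("faulty checksum error:", checksum_error(x, faulty, detail))
print("SNR of faulty output (dB):", round(snr_db(approx, faulty), 1))
```

Because the transform is linear, the output sums can be predicted cheaply from the input sums, so a nonzero checksum error signals a computation error without recomputing the transform.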



### Image 4.2.04 under VOS



Panels: (a) 1.2V, SNR=37.3; (b) and (c): 0.91V, SNR=36.8 and 0.96V, SNR=37.3



(d) 0 8477 SNR - 1 01

### On-Chip Sensors for Performance Monitoring

#### Example: RF Built-In Test using Amplitude Detectors

0.18µ CMOS technology



- 10 MHz output from sensors used to predict specifications
- Receiver: $0.5 \times 1.2\ mm^2$
- Detector: $0.06 \times 0.072\ mm^2$
- Area overhead: 1.4%
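The area figures can be cross-checked with a few lines of arithmetic. The dimensions are taken from the slide; the number of detectors is not stated there, so the detector count below is an assumption explored for illustration.

```python
RECEIVER_AREA_MM2 = 0.5 * 1.2      # from the slide
DETECTOR_AREA_MM2 = 0.06 * 0.072   # from the slide


def area_overhead(n_detectors: int) -> float:
    """Detector area as a fraction of receiver area (detector count assumed)."""
    return n_detectors * DETECTOR_AREA_MM2 / RECEIVER_AREA_MM2


for n in (1, 2):
    print(f"{n} detector(s): {area_overhead(n):.2%} overhead")
```

A single detector comes to well under 1% of the receiver area; the quoted 1.4% would be consistent with two detectors, though the source does not say how many are used.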



#### Relative error less than 5%



- We will have to learn to live with new defects and process variations
- Can try to design so that their effects are minimized
- New directions in adaptive operation to mitigate the effects of variations include high-level algorithmic checks and on-chip sensors
- Robust Design and Resilient Operation